Overview

Dataset statistics

Number of variables: 27
Number of observations: 517737
Missing cells: 8093719
Missing cells (%): 57.9%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 106.7 MiB
Average record size in memory: 216.0 B
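
The derived figures in this table can be cross-checked from the raw counts. The sketch below uses only the numbers reported above; the variable names are illustrative, and because the reported memory size is rounded to 0.1 MiB, the computed record size can differ from the reported one in the last digit.

```python
# Figures copied from the "Dataset statistics" table.
n_variables = 27
n_observations = 517_737
missing_cells = 8_093_719
total_bytes = 106.7 * 1024 ** 2  # reported size, rounded to 0.1 MiB

# Every variable contributes one cell per observation.
total_cells = n_variables * n_observations
missing_pct = 100 * missing_cells / total_cells
avg_record_bytes = total_bytes / n_observations

print(f"total cells: {total_cells}")
print(f"missing cells: {missing_pct:.1f}%")
print(f"average record size: {avg_record_bytes:.1f} B")
```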

Variable types

CAT: 21
NUM: 4
UNSUPPORTED: 2

Warnings

BID has a high cardinality: 133980 distinct values
StartDt has a high cardinality: 385 distinct values
EndDt has a high cardinality: 366 distinct values
PID has a high cardinality: 5012 distinct values
AttendingPhysician has a high cardinality: 74109 distinct values
OperatingPhysician has a high cardinality: 28532 distinct values
OtherPhysician has a high cardinality: 44388 distinct values
DiagnosisCode_1 has a high cardinality: 10354 distinct values
DiagnosisCode_2 has a high cardinality: 5056 distinct values
DiagnosisCode_3 has a high cardinality: 4448 distinct values
DiagnosisCode_4 has a high cardinality: 3925 distinct values
DiagnosisCode_5 has a high cardinality: 3412 distinct values
DiagnosisCode_6 has a high cardinality: 2968 distinct values
DiagnosisCode_7 has a high cardinality: 2635 distinct values
DiagnosisCode_8 has a high cardinality: 2260 distinct values
DiagnosisCode_9 has a high cardinality: 1894 distinct values
DiagnosisCode_10 has a high cardinality: 495 distinct values
AdmitDiagnosisCode has a high cardinality: 3715 distinct values
ProcedureCode_4 is highly correlated with AmtReimbursed and 3 other fields
AmtReimbursed is highly correlated with ProcedureCode_4
ProcedureCode_1 is highly correlated with ProcedureCode_4
ProcedureCode_2 is highly correlated with ProcedureCode_4
ProcedureCode_3 is highly correlated with ProcedureCode_4
OperatingPhysician has 427120 (82.5%) missing values
OtherPhysician has 322691 (62.3%) missing values
DiagnosisCode_1 has 10453 (2.0%) missing values
DiagnosisCode_2 has 195380 (37.7%) missing values
DiagnosisCode_3 has 314480 (60.7%) missing values
DiagnosisCode_4 has 392141 (75.7%) missing values
DiagnosisCode_5 has 443393 (85.6%) missing values
DiagnosisCode_6 has 468981 (90.6%) missing values
DiagnosisCode_7 has 484776 (93.6%) missing values
DiagnosisCode_8 has 494825 (95.6%) missing values
DiagnosisCode_9 has 502899 (97.1%) missing values
DiagnosisCode_10 has 516654 (99.8%) missing values
ProcedureCode_1 has 517575 (> 99.9%) missing values
ProcedureCode_2 has 517701 (> 99.9%) missing values
ProcedureCode_3 has 517733 (> 99.9%) missing values
ProcedureCode_4 has 517735 (> 99.9%) missing values
ProcedureCode_5 has 517737 (100.0%) missing values
ProcedureCode_6 has 517737 (100.0%) missing values
AdmitDiagnosisCode has 412312 (79.6%) missing values
AmtReimbursed is highly skewed (γ1 = 34.61136686)
ProcedureCode_3 is uniformly distributed
ProcedureCode_4 is uniformly distributed
CID has unique values
ProcedureCode_5 is an unsupported type; check whether it needs cleaning or further analysis
ProcedureCode_6 is an unsupported type; check whether it needs cleaning or further analysis
AmtReimbursed has 19568 (3.8%) zeros
DeductibleAmt has 496701 (95.9%) zeros

Reproduction

Analysis started: 2020-10-13 16:29:29.546891
Analysis finished: 2020-10-13 16:30:09.800344
Duration: 40.25 seconds
Software version: pandas-profiling v2.9.0
Download configuration: config.yaml

Variables

BID
Categorical

HIGH CARDINALITY

Distinct: 133980
Distinct (%): 25.9%
Missing: 0
Missing (%): 0.0%
Memory size: 4.0 MiB

Value                   Count     Frequency (%)
BENE118316              29        < 0.1%
BENE42721               29        < 0.1%
BENE59303               27        < 0.1%
BENE63544               27        < 0.1%
BENE63504               27        < 0.1%
BENE143400              27        < 0.1%
BENE36330               26        < 0.1%
BENE44241               26        < 0.1%
BENE87248               25        < 0.1%
BENE158374              25        < 0.1%
Other values (133970)   517469    99.9%
[Figure: Frequencies of value counts]

Unique

Unique: 33631
Unique (%): 6.5%
[Figure: Histogram of lengths of the category]

Length

Max length: 10
Median length: 9
Mean length: 9.400927498
Min length: 9

CID
Categorical

UNIQUE

Distinct: 517737
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Memory size: 4.0 MiB

Value                   Count     Frequency (%)
CLM622424               1         < 0.1%
CLM674546               1         < 0.1%
CLM291347               1         < 0.1%
CLM579979               1         < 0.1%
CLM287987               1         < 0.1%
CLM376031               1         < 0.1%
CLM361422               1         < 0.1%
CLM327383               1         < 0.1%
CLM495354               1         < 0.1%
CLM472387               1         < 0.1%
Other values (517727)   517727    > 99.9%
[Figure: Frequencies of value counts]

Unique

Unique: 517737
Unique (%): 100.0%
[Figure: Histogram of lengths of the category]

Length

Max length: 9
Median length: 9
Mean length: 8.99996137
Min length: 8

StartDt
Categorical

HIGH CARDINALITY

Distinct: 385
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 4.0 MiB

Value                   Count     Frequency (%)
2009-03-03              1574      0.3%
2009-03-21              1567      0.3%
2009-01-31              1566      0.3%
2009-04-25              1550      0.3%
2009-02-16              1549      0.3%
2009-05-07              1548      0.3%
2009-03-08              1545      0.3%
2009-06-10              1544      0.3%
2009-06-07              1539      0.3%
2009-05-01              1537      0.3%
Other values (375)      502218    97.0%
[Figure: Frequencies of value counts]

Unique

Unique: 0
Unique (%): 0.0%
[Figure: Histogram of lengths of the category]

Length

Max length: 10
Median length: 10
Mean length: 10
Min length: 10

EndDt
Categorical

HIGH CARDINALITY

Distinct: 366
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 4.0 MiB

Value                   Count     Frequency (%)
2009-03-03              1563      0.3%
2009-03-21              1561      0.3%
2009-04-23              1554      0.3%
2009-05-01              1548      0.3%
2009-06-20              1542      0.3%
2009-03-08              1541      0.3%
2009-05-13              1540      0.3%
2009-02-16              1539      0.3%
2009-01-31              1537      0.3%
2009-03-30              1536      0.3%
Other values (356)      502276    97.0%
[Figure: Frequencies of value counts]

Unique

Unique: 1
Unique (%): < 0.1%
[Figure: Histogram of lengths of the category]

Length

Max length: 10
Median length: 10
Mean length: 10
Min length: 10
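
StartDt and EndDt are profiled as categorical strings, yet every value is a 10-character ISO date. A minimal sketch of parsing them into real dates, for example to derive a claim duration, could look like this. The helper name `claim_span_days` is hypothetical and not part of the report; the date pairs are taken from the sample rows.

```python
from datetime import date


def claim_span_days(start_dt: str, end_dt: str) -> int:
    """Return the number of days between two ISO-formatted date strings."""
    return (date.fromisoformat(end_dt) - date.fromisoformat(start_dt)).days


# StartDt / EndDt pairs taken from the sample rows of this report.
print(claim_span_days("2009-10-11", "2009-10-11"))  # 0
print(claim_span_days("2009-04-25", "2009-05-15"))  # 20
```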

PID
Categorical

HIGH CARDINALITY

Distinct: 5012
Distinct (%): 1.0%
Missing: 0
Missing (%): 0.0%
Memory size: 4.0 MiB

Value                   Count     Frequency (%)
PRV51459                8240      1.6%
PRV53797                4739      0.9%
PRV51574                4444      0.9%
PRV53918                3588      0.7%
PRV54895                3433      0.7%
PRV55215                3250      0.6%
PRV56011                2833      0.5%
PRV52064                2806      0.5%
PRV55004                2396      0.5%
PRV57306                2315      0.4%
Other values (5002)     479693    92.7%
[Figure: Frequencies of value counts]

Unique

Unique: 200
Unique (%): < 0.1%
[Figure: Histogram of lengths of the category]

Length

Max length: 8
Median length: 8
Mean length: 8
Min length: 8

AmtReimbursed
Real number (ℝ≥0)

HIGH CORRELATION
SKEWED
ZEROS

Distinct: 342
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 286.3347993
Minimum: 0
Maximum: 102500
Zeros: 19568
Zeros (%): 3.8%
Memory size: 4.0 MiB

Quantile statistics

Minimum: 0
5-th percentile: 10
Q1: 40
median: 80
Q3: 200
95-th percentile: 1500
Maximum: 102500
Range: 102500
Interquartile range (IQR): 160

Descriptive statistics

Standard deviation: 694.0343433
Coefficient of variation (CV): 2.423856076
Kurtosis: 4172.177736
Mean: 286.3347993
Median Absolute Deviation (MAD): 50
Skewness: 34.61136686
Sum: 148246120
Variance: 481683.6696
Monotonicity: Not monotonic
[Figure: Histogram with fixed size bins (bins=50)]

Most frequent values

Value                   Count     Frequency (%)
100                     52943     10.2%
10                      42461     8.2%
200                     41594     8.0%
60                      40762     7.9%
30                      33919     6.6%
40                      33616     6.5%
50                      31293     6.0%
20                      27960     5.4%
80                      25095     4.8%
70                      24412     4.7%
Other values (332)      163682    31.6%

Smallest values

Value                   Count     Frequency (%)
0                       19568     3.8%
10                      42461     8.2%
20                      27960     5.4%
30                      33919     6.6%
40                      33616     6.5%

Largest values

Value                   Count     Frequency (%)
102500                  1         < 0.1%
101250                  1         < 0.1%
95580                   1         < 0.1%
85680                   1         < 0.1%
84660                   1         < 0.1%
 

AttendingPhysician
Categorical

HIGH CARDINALITY

Distinct: 74109
Distinct (%): 14.4%
Missing: 1396
Missing (%): 0.3%
Memory size: 4.0 MiB

Value                   Count     Frequency (%)
PHY330576               2534      0.5%
PHY350277               1628      0.3%
PHY412132               1321      0.3%
PHY423534               1223      0.2%
PHY314027               1200      0.2%
PHY327046               1181      0.2%
PHY338032               1158      0.2%
PHY337425               1156      0.2%
PHY357120               1156      0.2%
PHY341578               1133      0.2%
Other values (74099)    502651    97.1%
(Missing)               1396      0.3%
[Figure: Frequencies of value counts]

Unique

Unique: 32687
Unique (%): 6.3%
[Figure: Histogram of lengths of the category]

Length

Max length: 9
Median length: 9
Mean length: 8.983821902
Min length: 3

OperatingPhysician
Categorical

HIGH CARDINALITY
MISSING

Distinct: 28532
Distinct (%): 31.5%
Missing: 427120
Missing (%): 82.5%
Memory size: 4.0 MiB

Value                   Count     Frequency (%)
PHY330576               424       0.1%
PHY424897               293       0.1%
PHY314027               256       < 0.1%
PHY423534               250       < 0.1%
PHY357120               249       < 0.1%
PHY412132               245       < 0.1%
PHY327046               236       < 0.1%
PHY333735               232       < 0.1%
PHY381249               231       < 0.1%
PHY337425               226       < 0.1%
Other values (28522)    87975     17.0%
(Missing)               427120    82.5%
[Figure: Frequencies of value counts]

Unique

Unique: 17159
Unique (%): 18.9%
[Figure: Histogram of lengths of the category]

Length

Max length: 9
Median length: 3
Mean length: 4.050150945
Min length: 3
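
Two of the OperatingPhysician figures are easy to misread: Distinct (%) is relative to the non-missing rows, and the length statistics appear to include missing values. The sketch below checks both readings against the reported numbers. It rests on two assumptions that the report does not state: every physician identifier is 9 characters long, and each missing value is counted as a 3-character string such as "nan".

```python
# Counts copied from the OperatingPhysician section of this report.
n_rows = 517_737
n_missing = 427_120
n_distinct = 28_532

n_present = n_rows - n_missing
distinct_pct = 100 * n_distinct / n_present  # share among non-missing rows

# Assumption: identifiers have 9 characters, missing values count as 3 ("nan").
mean_length = (9 * n_present + 3 * n_missing) / n_rows

print(f"non-missing rows: {n_present}")
print(f"distinct (%): {distinct_pct:.1f}")
print(f"mean length: {mean_length:.9f}")
```

Under these assumptions the computed mean length reproduces the reported 4.050150945, which suggests that the median and minimum length of 3 describe missing entries rather than real identifiers.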

OtherPhysician
Categorical

HIGH CARDINALITY
MISSING

Distinct: 44388
Distinct (%): 22.8%
Missing: 322691
Missing (%): 62.3%
Memory size: 4.0 MiB

Value                   Count     Frequency (%)
PHY412132               1247      0.2%
PHY341578               1098      0.2%
PHY338032               1070      0.2%
PHY337425               1041      0.2%
PHY347064               806       0.2%
PHY322092               771       0.1%
PHY409965               744       0.1%
PHY313818               730       0.1%
PHY350277               682       0.1%
PHY415321               678       0.1%
Other values (44378)    186179    36.0%
(Missing)               322691    62.3%
[Figure: Frequencies of value counts]

Unique

Unique: 24312
Unique (%): 12.5%
[Figure: Histogram of lengths of the category]

Length

Max length: 9
Median length: 3
Mean length: 5.260367716
Min length: 3

DiagnosisCode_1
Categorical

HIGH CARDINALITY
MISSING

Distinct: 10354
Distinct (%): 2.0%
Missing: 10453
Missing (%): 2.0%
Memory size: 4.0 MiB

Value                   Count     Frequency (%)
4019                    13803     2.7%
4011                    12512     2.4%
2724                    3603      0.7%
2720                    3209      0.6%
2722                    3028      0.6%
2721                    2998      0.6%
2723                    2995      0.6%
78651                   2251      0.4%
78659                   2181      0.4%
78650                   2179      0.4%
Other values (10344)    458525    88.6%
(Missing)               10453     2.0%
[Figure: Frequencies of value counts]

Unique

Unique: 1212
Unique (%): 0.2%
[Figure: Histogram of lengths of the category]

Length

Max length: 5
Median length: 4
Mean length: 4.455167392
Min length: 3

DiagnosisCode_2
Categorical

HIGH CARDINALITY
MISSING

Distinct: 5056
Distinct (%): 1.6%
Missing: 195380
Missing (%): 37.7%
Memory size: 4.0 MiB

Value                   Count     Frequency (%)
4019                    19894     3.8%
25000                   10674     2.1%
2724                    10147     2.0%
V5869                   9573      1.8%
V5861                   9550      1.8%
2449                    5090      1.0%
42731                   5052      1.0%
2720                    4750      0.9%
4011                    4579      0.9%
28521                   4063      0.8%
Other values (5046)     238985    46.2%
(Missing)               195380    37.7%
[Figure: Frequencies of value counts]

Unique

Unique: 1208
Unique (%): 0.4%
[Figure: Histogram of lengths of the category]

Length

Max length: 5
Median length: 4
Mean length: 3.917852114
Min length: 3

DiagnosisCode_3
Categorical

HIGH CARDINALITY
MISSING

Distinct: 4448
Distinct (%): 2.2%
Missing: 314480
Missing (%): 60.7%
Memory size: 4.0 MiB

Value                   Count     Frequency (%)
4019                    12126     2.3%
25000                   6838      1.3%
2724                    6271      1.2%
V5869                   6002      1.2%
V5861                   4028      0.8%
2449                    3238      0.6%
2720                    3097      0.6%
42731                   2999      0.6%
4011                    2844      0.5%
28521                   2654      0.5%
Other values (4438)     153160    29.6%
(Missing)               314480    60.7%
[Figure: Frequencies of value counts]

Unique

Unique: 1115
Unique (%): 0.5%
[Figure: Histogram of lengths of the category]

Length

Max length: 5
Median length: 3
Mean length: 3.577671675
Min length: 3

DiagnosisCode_4
Categorical

HIGH CARDINALITY
MISSING

Distinct: 3925
Distinct (%): 3.1%
Missing: 392141
Missing (%): 75.7%
Memory size: 4.0 MiB

Value                   Count     Frequency (%)
4019                    7088      1.4%
25000                   4235      0.8%
2724                    3736      0.7%
V5869                   3300      0.6%
2449                    1942      0.4%
2720                    1940      0.4%
V5861                   1750      0.3%
42731                   1691      0.3%
4011                    1644      0.3%
53081                   1449      0.3%
Other values (3915)     96821     18.7%
(Missing)               392141    75.7%
[Figure: Frequencies of value counts]

Unique

Unique: 1134
Unique (%): 0.9%
[Figure: Histogram of lengths of the category]

Length

Max length: 5
Median length: 3
Mean length: 3.35537155
Min length: 3

DiagnosisCode_5
Categorical

HIGH CARDINALITY
MISSING

Distinct: 3412
Distinct (%): 4.6%
Missing: 443393
Missing (%): 85.6%
Memory size: 4.0 MiB

Value                   Count     Frequency (%)
4019                    4116      0.8%
25000                   2473      0.5%
2724                    1945      0.4%
V5869                   1852      0.4%
2449                    1081      0.2%
2720                    1069      0.2%
53081                   969       0.2%
V5861                   941       0.2%
42731                   938       0.2%
4011                    882       0.2%
Other values (3402)     58078     11.2%
(Missing)               443393    85.6%
[Figure: Frequencies of value counts]

Unique

Unique: 1049
Unique (%): 1.4%
[Figure: Histogram of lengths of the category]

Length

Max length: 5
Median length: 3
Mean length: 3.211261702
Min length: 3

DiagnosisCode_6
Categorical

HIGH CARDINALITY
MISSING

Distinct: 2968
Distinct (%): 6.1%
Missing: 468981
Missing (%): 90.6%
Memory size: 4.0 MiB

Value                   Count     Frequency (%)
4019                    2550      0.5%
25000                   1595      0.3%
2724                    1169      0.2%
V5869                   1106      0.2%
2720                    695       0.1%
2449                    685       0.1%
42731                   649       0.1%
53081                   622       0.1%
496                     573       0.1%
V5861                   570       0.1%
Other values (2958)     38542     7.4%
(Missing)               468981    90.6%
[Figure: Frequencies of value counts]

Unique

Unique: 990
Unique (%): 2.0%
[Figure: Histogram of lengths of the category]

Length

Max length: 5
Median length: 3
Mean length: 3.138778955
Min length: 3

DiagnosisCode_7
Categorical

HIGH CARDINALITY
MISSING

Distinct: 2635
Distinct (%): 8.0%
Missing: 484776
Missing (%): 93.6%
Memory size: 4.0 MiB

Value                   Count     Frequency (%)
4019                    1612      0.3%
25000                   1003      0.2%
2724                    733       0.1%
V5869                   717       0.1%
2720                    502       0.1%
2449                    436       0.1%
53081                   431       0.1%
42731                   418       0.1%
496                     391       0.1%
4280                    377       0.1%
Other values (2625)     26341     5.1%
(Missing)               484776    93.6%
[Figure: Frequencies of value counts]

Unique

Unique: 966
Unique (%): 2.9%
[Figure: Histogram of lengths of the category]

Length

Max length: 5
Median length: 3
Mean length: 3.09373485
Min length: 3

DiagnosisCode_8
Categorical

HIGH CARDINALITY
MISSING

Distinct: 2260
Distinct (%): 9.9%
Missing: 494825
Missing (%): 95.6%
Memory size: 4.0 MiB

Value                   Count     Frequency (%)
4019                    1057      0.2%
25000                   702       0.1%
2724                    516       0.1%
V5869                   471       0.1%
2720                    325       0.1%
2449                    313       0.1%
53081                   297       0.1%
42731                   280       0.1%
496                     277       0.1%
V5861                   262       0.1%
Other values (2250)     18412     3.6%
(Missing)               494825    95.6%
[Figure: Frequencies of value counts]

Unique

Unique: 844
Unique (%): 3.7%
[Figure: Histogram of lengths of the category]

Length

Max length: 5
Median length: 3
Mean length: 3.065114141
Min length: 3

DiagnosisCode_9
Categorical

HIGH CARDINALITY
MISSING

Distinct: 1894
Distinct (%): 12.8%
Missing: 502899
Missing (%): 97.1%
Memory size: 4.0 MiB

Value                   Count     Frequency (%)
4019                    616       0.1%
25000                   468       0.1%
V5869                   292       0.1%
2724                    289       0.1%
2720                    250       < 0.1%
53081                   208       < 0.1%
496                     185       < 0.1%
2449                    184       < 0.1%
V5861                   183       < 0.1%
4280                    171       < 0.1%
Other values (1884)     11992     2.3%
(Missing)               502899    97.1%
[Figure: Frequencies of value counts]

Unique

Unique: 759
Unique (%): 5.1%
[Figure: Histogram of lengths of the category]

Length

Max length: 5
Median length: 3
Mean length: 3.042253113
Min length: 3

DiagnosisCode_10
Categorical

HIGH CARDINALITY
MISSING

Distinct: 495
Distinct (%): 45.7%
Missing: 516654
Missing (%): 99.8%
Memory size: 4.0 MiB

Value                   Count     Frequency (%)
4019                    41        < 0.1%
25000                   35        < 0.1%
2720                    17        < 0.1%
V5869                   16        < 0.1%
42731                   15        < 0.1%
53081                   15        < 0.1%
2724                    14        < 0.1%
2449                    13        < 0.1%
3051                    13        < 0.1%
E8490                   13        < 0.1%
Other values (485)      891       0.2%
(Missing)               516654    99.8%
[Figure: Frequencies of value counts]

Unique

Unique: 323
Unique (%): 29.8%
[Figure: Histogram of lengths of the category]

Length

Max length: 5
Median length: 3
Mean length: 3.003159906
Min length: 3

ProcedureCode_1
Real number (ℝ≥0)

HIGH CORRELATION
MISSING

Distinct: 80
Distinct (%): 49.4%
Missing: 517575
Missing (%): > 99.9%
Infinite: 0
Infinite (%): 0.0%
Mean: 6116.611111
Minimum: 51
Maximum: 9999
Zeros: 0
Zeros (%): 0.0%
Memory size: 4.0 MiB

Quantile statistics

Minimum: 51
5-th percentile: 70.25
Q1: 3893
median: 5244.5
Q3: 9421.5
95-th percentile: 9952
Maximum: 9999
Range: 9948
Interquartile range (IQR): 5528.5

Descriptive statistics

Standard deviation: 3217.719258
Coefficient of variation (CV): 0.5260624223
Kurtosis: -1.101239475
Mean: 6116.611111
Median Absolute Deviation (MAD): 2908.5
Skewness: -0.2939338603
Sum: 990891
Variance: 10353717.22
Monotonicity: Not monotonic
[Figure: Histogram with fixed size bins (bins=50)]

Most frequent values

Value                   Count     Frequency (%)
9904                    15        < 0.1%
3722                    8         < 0.1%
4516                    8         < 0.1%
66                      7         < 0.1%
5123                    7         < 0.1%
9952                    5         < 0.1%
9672                    5         < 0.1%
3893                    5         < 0.1%
8622                    4         < 0.1%
3995                    4         < 0.1%
Other values (70)       94        < 0.1%
(Missing)               517575    > 99.9%

Smallest values

Value                   Count     Frequency (%)
51                      2         < 0.1%
66                      7         < 0.1%
151                     1         < 0.1%
239                     1         < 0.1%
311                     1         < 0.1%

Largest values

Value                   Count     Frequency (%)
9999                    1         < 0.1%
9961                    1         < 0.1%
9955                    4         < 0.1%
9952                    5         < 0.1%
9929                    1         < 0.1%
 

ProcedureCode_2
Real number (ℝ≥0)

HIGH CORRELATION
MISSING

Distinct: 22
Distinct (%): 61.1%
Missing: 517701
Missing (%): > 99.9%
Infinite: 0
Infinite (%): 0.0%
Mean: 4503.277778
Minimum: 412
Maximum: 9982
Zeros: 0
Zeros (%): 0.0%
Memory size: 4.0 MiB

Quantile statistics

Minimum: 412
5-th percentile: 496
Q1: 2724
median: 4019
Q3: 5849
95-th percentile: 8389.25
Maximum: 9982
Range: 9570
Interquartile range (IQR): 3125

Descriptive statistics

Standard deviation: 2504.015
Coefficient of variation (CV): 0.556042759
Kurtosis: -0.3033716551
Mean: 4503.277778
Median Absolute Deviation (MAD): 1295
Skewness: 0.4816786953
Sum: 162118
Variance: 6270091.121
Monotonicity: Not monotonic
[Figure: Histogram with fixed size bins (bins=22)]

Most frequent values

Value                   Count     Frequency (%)
4019                    6         < 0.1%
2724                    6         < 0.1%
1741                    2         < 0.1%
496                     2         < 0.1%
5849                    2         < 0.1%
7820                    2         < 0.1%
3811                    1         < 0.1%
2731                    1         < 0.1%
4439                    1         < 0.1%
4571                    1         < 0.1%
Other values (12)       12        < 0.1%
(Missing)               517701    > 99.9%

Smallest values

Value                   Count     Frequency (%)
412                     1         < 0.1%
496                     2         < 0.1%
1741                    2         < 0.1%
2724                    6         < 0.1%
2731                    1         < 0.1%

Largest values

Value                   Count     Frequency (%)
9982                    1         < 0.1%
9971                    1         < 0.1%
7862                    1         < 0.1%
7840                    1         < 0.1%
7820                    2         < 0.1%
 

ProcedureCode_3
Categorical

HIGH CORRELATION
MISSING
UNIFORM

Distinct: 4
Distinct (%): 100.0%
Missing: 517733
Missing (%): > 99.9%
Memory size: 4.0 MiB

Value                   Count     Frequency (%)
412                     1         < 0.1%
2724                    1         < 0.1%
4401                    1         < 0.1%
4299                    1         < 0.1%
(Missing)               517733    > 99.9%
[Figure: Frequencies of value counts]

Unique

Unique: 4
Unique (%): 100.0%
[Figure: Histogram of lengths of the category]

Length

Max length: 6
Median length: 3
Mean length: 3.000021246
Min length: 3

ProcedureCode_4
Categorical

HIGH CORRELATION
MISSING
UNIFORM

Distinct: 2
Distinct (%): 100.0%
Missing: 517735
Missing (%): > 99.9%
Memory size: 4.0 MiB

Value                   Count     Frequency (%)
7840                    1         < 0.1%
311                     1         < 0.1%
(Missing)               517735    > 99.9%
[Figure: Frequencies of value counts]

Unique

Unique: 2
Unique (%): 100.0%
[Figure: Histogram of lengths of the category]

Length

Max length: 6
Median length: 3
Mean length: 3.000009657
Min length: 3

ProcedureCode_5
Unsupported

MISSING
REJECTED
UNSUPPORTED

Missing: 517737
Missing (%): 100.0%
Memory size: 4.0 MiB

ProcedureCode_6
Unsupported

MISSING
REJECTED
UNSUPPORTED

Missing: 517737
Missing (%): 100.0%
Memory size: 4.0 MiB

DeductibleAmt
Real number (ℝ≥0)

ZEROS

Distinct: 16
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 2.779233472
Minimum: 0
Maximum: 897
Zeros: 496701
Zeros (%): 95.9%
Memory size: 4.0 MiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
median: 0
Q3: 0
95-th percentile: 0
Maximum: 897
Range: 897
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 15.78583932
Coefficient of variation (CV): 5.67992559
Kurtosis: 180.6903587
Mean: 2.779233472
Median Absolute Deviation (MAD): 0
Skewness: 8.735340013
Sum: 1438912
Variance: 249.1927229
Monotonicity: Not monotonic
[Figure: Histogram with fixed size bins (bins=16)]

Most frequent values

Value                   Count     Frequency (%)
0                       496701    95.9%
100                     4582      0.9%
70                      2420      0.5%
60                      2065      0.4%
40                      2045      0.4%
80                      2024      0.4%
50                      1969      0.4%
20                      1406      0.3%
30                      1336      0.3%
90                      1245      0.2%
Other values (6)        1944      0.4%

Smallest values

Value                   Count     Frequency (%)
0                       496701    95.9%
10                      1203      0.2%
20                      1406      0.3%
30                      1336      0.3%
40                      2045      0.4%

Largest values

Value                   Count     Frequency (%)
897                     2         < 0.1%
886                     1         < 0.1%
876                     2         < 0.1%
865                     2         < 0.1%
200                     734       0.1%
 

AdmitDiagnosisCode
Categorical

HIGH CARDINALITY
MISSING

Distinct: 3715
Distinct (%): 3.5%
Missing: 412312
Missing (%): 79.6%
Memory size: 4.0 MiB

Value                   Count     Frequency (%)
V7612                   4074      0.8%
42731                   3001      0.6%
4019                    2627      0.5%
25000                   2346      0.5%
V5883                   1871      0.4%
7295                    1616      0.3%
78900                   1550      0.3%
V5861                   1536      0.3%
2724                    1506      0.3%
7242                    1432      0.3%
Other values (3705)     83866     16.2%
(Missing)               412312    79.6%
[Figure: Frequencies of value counts]

Unique

Unique: 1106
Unique (%): 1.0%
[Figure: Histogram of lengths of the category]

Length

Max length: 5
Median length: 3
Mean length: 3.311370445
Min length: 3

Interactions

[Figures: 16 interaction plots]

Correlations


Pearson's r

Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1: -1 indicates total negative linear correlation, 0 indicates no linear correlation, and +1 indicates total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, which implies that for a linear function the angle to the x-axis does not affect r.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
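
As an illustration of this definition, here is a minimal pure-Python sketch. The function name `pearson_r` and the sample data are illustrative and not taken from the report; the code assumes both inputs have the same length and neither is constant, since a constant input would make the denominator zero.

```python
import math


def pearson_r(xs, ys):
    """Covariance of xs and ys divided by the product of their standard deviations."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # The 1/n factors of covariance and standard deviations cancel, so sums suffice.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


xs = [1, 2, 3, 4, 5]
ys = [2, 1, 4, 3, 5]
print(pearson_r(xs, ys))
# Shifting and rescaling one variable leaves r unchanged.
print(pearson_r([2 * x + 5 for x in xs], ys))
```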

Spearman's ρ

Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better at catching nonlinear monotonic correlations than Pearson's r. Its value lies between -1 and +1: -1 indicates total negative monotonic correlation, 0 indicates no monotonic correlation, and +1 indicates total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
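
The rank-based calculation can be sketched in pure Python as follows. The helper names and the sample data are illustrative, and tied values are given their average rank, which is a common convention but an assumption here, since the report does not say how ties are handled.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, giving tied values the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = average
        i = j + 1
    return ranks


def pearson_r(xs, ys):
    """Covariance divided by the product of the standard deviations."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


def spearman_rho(xs, ys):
    """Pearson's r applied to the rank variables of xs and ys."""
    return pearson_r(average_ranks(xs), average_ranks(ys))


xs = [1, 2, 3, 4, 5]
ys = [x ** 3 for x in xs]  # monotonic but not linear
print(spearman_rho(xs, ys), pearson_r(xs, ys))
```

For the cubic relationship in the example, ρ reaches its maximum while r stays below 1, which is the difference described above.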

Kendall's τ

Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1: -1 indicates total negative correlation, 0 indicates no correlation, and +1 indicates total positive correlation.

To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is given by the number of concordant pairs minus the discordant pairs divided by the total number of pairs.
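
A minimal sketch of this pair-counting definition follows. It implements the plain form described above (often called tau-a), in which tied pairs count as neither concordant nor discordant; library implementations commonly apply a tie correction, so their values can differ when ties are present. The function name and data are illustrative.

```python
from itertools import combinations


def kendall_tau_a(xs, ys):
    """(concordant pairs - discordant pairs) / total number of pairs."""
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2):
        direction = (x1 - x2) * (y1 - y2)
        if direction > 0:
            concordant += 1   # both variables move in the same direction
        elif direction < 0:
            discordant += 1   # the variables move in opposite directions
    n_pairs = len(xs) * (len(xs) - 1) // 2
    return (concordant - discordant) / n_pairs


# Five concordant pairs and one discordant pair out of six.
print(kendall_tau_a([1, 2, 3, 4], [1, 3, 2, 4]))
```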

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency, and reverts to the Pearson correlation coefficient in the case of a bivariate normal input distribution. Extensive documentation is available in the phik package documentation.

Cramér's V (φc)

Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimators used for Cramér's V have been shown to be biased, even for large samples. We use a bias-corrected measure proposed by Bergsma in 2013.
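
For illustration, the bias-corrected estimator can be sketched in pure Python from a contingency table. This is a sketch of the Bergsma (2013) correction as commonly stated, not the report's own code; the function name and the example tables are hypothetical, and the code assumes every row and column total is positive.

```python
import math


def cramers_v_corrected(table):
    """Bias-corrected Cramér's V for a contingency table given as a list of rows."""
    n = sum(sum(row) for row in table)
    r, k = len(table), len(table[0])
    row_totals = [sum(row) for row in table]
    col_totals = [sum(table[i][j] for i in range(r)) for j in range(k)]

    # Pearson chi-squared statistic against the independence expectation.
    chi2 = 0.0
    for i in range(r):
        for j in range(k):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (table[i][j] - expected) ** 2 / expected

    phi2 = chi2 / n
    phi2_corrected = max(0.0, phi2 - (k - 1) * (r - 1) / (n - 1))
    r_corrected = r - (r - 1) ** 2 / (n - 1)
    k_corrected = k - (k - 1) ** 2 / (n - 1)
    return math.sqrt(phi2_corrected / min(k_corrected - 1, r_corrected - 1))


print(cramers_v_corrected([[10, 10], [10, 10]]))  # independent categories
print(cramers_v_corrected([[50, 0], [0, 50]]))    # perfectly associated categories
```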

Missing values

[Figures: 4 missing-value plots]

Sample

First rows

BID CID StartDt EndDt PID AmtReimbursed AttendingPhysician OperatingPhysician OtherPhysician DiagnosisCode_1 DiagnosisCode_2 DiagnosisCode_3 DiagnosisCode_4 DiagnosisCode_5 DiagnosisCode_6 DiagnosisCode_7 DiagnosisCode_8 DiagnosisCode_9 DiagnosisCode_10 ProcedureCode_1 ProcedureCode_2 ProcedureCode_3 ProcedureCode_4 ProcedureCode_5 ProcedureCode_6 DeductibleAmt AdmitDiagnosisCode
0BENE11002CLM6243492009-10-112009-10-11PRV5601130PHY326117NaNNaN78943V5866V1272NaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaN056409
1BENE11003CLM1899472009-02-122009-02-12PRV5761080PHY362868NaNNaN6115NaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaN079380
2BENE11003CLM4380212009-06-272009-06-27PRV5759510PHY328821NaNNaN2723NaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaN0NaN
3BENE11004CLM1218012009-01-062009-01-06PRV5601140PHY334319NaNNaN71988NaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaN0NaN
4BENE11004CLM1509982009-01-222009-01-22PRV56011200PHY403831NaNNaN82382300007288742807197V4577NaNNaNNaNNaNNaNNaNNaNNaNNaNNaN071947
5BENE11004CLM1732242009-02-032009-02-03PRV5601120PHY339887NaNNaN20381NaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaN0NaN
6BENE11004CLM2247412009-03-032009-03-03PRV5601140PHY345721NaNNaNV654642802449V854NaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaN0NaN
7BENE11004CLM2525122009-03-182009-03-18PRV56011200PHY346833NaNPHY3468337229072457194571695NaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaN0NaN
8BENE11004CLM3226832009-04-252009-05-15PRV5601160PHY372925NaNPHY311407718567265V125472957275140199597844971596NaNNaNNaNNaNNaNNaNNaN0NaN
9BENE11004CLM3395002009-05-042009-05-16PRV56011500PHY412904NaNPHY3964737237NaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaN0NaN

Last rows

BID CID StartDt EndDt PID AmtReimbursed AttendingPhysician OperatingPhysician OtherPhysician DiagnosisCode_1 DiagnosisCode_2 DiagnosisCode_3 DiagnosisCode_4 DiagnosisCode_5 DiagnosisCode_6 DiagnosisCode_7 DiagnosisCode_8 DiagnosisCode_9 DiagnosisCode_10 ProcedureCode_1 ProcedureCode_2 ProcedureCode_3 ProcedureCode_4 ProcedureCode_5 ProcedureCode_6 DeductibleAmt AdmitDiagnosisCode
517727BENE159198CLM2552682009-03-192009-03-19PRV5367270PHY317739PHY317739PHY4238865929NaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaN0NaN
517728BENE159198CLM2756042009-03-302009-04-19PRV5369950PHY380182NaNNaN71899733927151673342NaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaN071946
517729BENE159198CLM3107202009-04-182009-05-08PRV536700PHY329971NaNNaN29561V5869NaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaN029570
517730BENE159198CLM3477782009-05-082009-05-08PRV5367680PHY361063NaNNaN30279V5869NaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaN0NaN
517731BENE159198CLM4003952009-06-062009-06-06PRV53699100PHY380182NaNPHY38575229212NaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaN0NaN
517732BENE159198CLM5107922009-08-062009-08-06PRV53699800PHY364188PHY364188PHY3857522163V457553190NaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaN0NaN
517733BENE159198CLM5512942009-08-292009-08-29PRV53702400PHY423019PHY332284NaN07041578125000NaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaN0NaN
517734BENE159198CLM5964442009-09-242009-09-24PRV5367660PHY361063NaNNaNV57078079NaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaN0NaN
517735BENE159198CLM6369922009-10-182009-10-18PRV5368970PHY403198NaNPHY419379NaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaN0NaN
517736BENE159198CLM6861392009-11-172009-11-18PRV5368980PHY419379NaNPHY41937978900786094280719463310753112724V103NaNNaNNaNNaNNaNNaNNaNNaN0NaN